 natural science


A Perspective on Symbolic Machine Learning in Physical Sciences

Makke, Nour, Chawla, Sanjay

arXiv.org Artificial Intelligence

Machine learning is rapidly making its way across all of the natural sciences, including the physical sciences. The rate at which ML is impacting non-scientific disciplines is incomparable to that in the physical sciences. This is partly due to the uninterpretable nature of deep neural networks. Symbolic machine learning stands as an equal and complementary partner to numerical machine learning in speeding up scientific discovery in physics. This perspective discusses the main differences between the ML and scientific approaches. It stresses the need to develop and apply symbolic machine learning to physics problems equally, in parallel to numerical machine learning, because of the dual nature of physics research.


Examining the Behavior of LLM Architectures Within the Framework of Standardized National Exams in Brazil

Locatelli, Marcelo Sartori, Miranda, Matheus Prado, Costa, Igor Joaquim da Silva, Prates, Matheus Torres, Thomé, Victor, Monteiro, Mateus Zaparoli, Lacerda, Tomas, Pagano, Adriana, Neto, Eduardo Rios, Meira, Wagner Jr., Almeida, Virgilio

arXiv.org Artificial Intelligence

The Exame Nacional do Ensino Médio (ENEM) is a pivotal test for Brazilian students, required for admission to a significant number of universities in Brazil. The test consists of four objective high-school level tests on Math, Humanities, Natural Sciences and Languages, and one written essay. Students' answers to the test and to the accompanying socioeconomic status questionnaire are made public every year (albeit anonymized) due to transparency policies from the Brazilian Government. In the context of large language models (LLMs), these data lend themselves nicely to comparing different groups of humans with AI, as we can have access to human and machine answer distributions. We leverage these characteristics of the ENEM dataset and compare GPT-3.5 and 4, and MariTalk, a model trained using Portuguese data, to humans, aiming to ascertain how their answers relate to real societal groups and what that may reveal about the model biases. We divide the human groups by using socioeconomic status (SES), and compare their answer distribution with LLMs for each question and for the essay. We find no significant biases when comparing LLM performance to humans on the multiple-choice Brazilian Portuguese tests, as the distance between model and human answers is mostly determined by the human accuracy. A similar conclusion is found by looking at the generated text: when analyzing the essays, we observe that human and LLM essays differ in a few key factors, one being the choice of words, where model essays were easily separable from human ones. The texts also differ syntactically, with LLM-generated essays exhibiting, on average, shorter sentences and fewer thought units, among other differences. These results suggest that, for Brazilian Portuguese in the ENEM context, LLM outputs represent no group of humans, being significantly different from the answers from Brazilian students across all tests.


Evaluating Open Language Models Across Task Types, Application Domains, and Reasoning Types: An In-Depth Experimental Analysis

Sinha, Neelabh, Jain, Vinija, Chadha, Aman

arXiv.org Artificial Intelligence

The rapid rise of Language Models (LMs) has expanded their use in several applications. Yet, due to constraints of model size, associated cost, or proprietary restrictions, utilizing state-of-the-art (SOTA) LLMs is not always feasible. With open, smaller LMs emerging, more applications can leverage their capabilities, but selecting the right LM can be challenging. This work conducts an in-depth experimental analysis of the semantic correctness of outputs of 10 smaller, open LMs across three aspects: task types, application domains and reasoning types, using diverse prompt styles. We demonstrate that the most effective models and prompt styles vary depending on the specific requirements. Our analysis provides a comparative assessment of LMs and prompt styles using a proposed three-tier schema of aspects for their strategic selection based on use-case and other constraints. We also show that, if utilized appropriately, these LMs can compete with, and sometimes outperform, SOTA LLMs like DeepSeek-v2, GPT-3.5-Turbo, and GPT-4o.


Is machine learning good or bad for the natural sciences?

Hogg, David W., Villar, Soledad

arXiv.org Machine Learning

Machine learning (ML) methods are having a huge impact across all of the sciences. However, ML has a strong ontology - in which only the data exist - and a strong epistemology - in which a model is considered good if it performs well on held-out training data. These philosophies are in strong conflict with both standard practices and key philosophies in the natural sciences. Here we identify some locations for ML in the natural sciences at which the ontology and epistemology are valuable. For example, when an expressive machine learning model is used in a causal inference to represent the effects of confounders, such as foregrounds, backgrounds, or instrument calibration parameters, the model capacity and loose philosophy of ML can make the results more trustworthy. We also show that there are contexts in which the introduction of ML introduces strong, unwanted statistical biases. For one, when ML models are used to emulate physical (or first-principles) simulations, they amplify confirmation biases. For another, when expressive regressions are used to label datasets, those labels cannot be used in downstream joint or ensemble analyses without taking on uncontrolled biases. The question in the title is being asked of all of the natural sciences; that is, we are calling on the scientific communities to take a step back and consider the role and value of ML in their fields; the (partial) answers we give here come from the particular perspective of physics.


Computational Natural Philosophy: A Thread from Presocratics through Turing to ChatGPT

Dodig-Crnkovic, Gordana

arXiv.org Artificial Intelligence

Modern computational natural philosophy conceptualizes the universe in terms of information and computation, establishing a framework for the study of cognition and intelligence. Despite some critiques, this computational perspective has significantly influenced our understanding of the natural world, leading to the development of AI systems like ChatGPT based on deep neural networks. Advancements in this domain have been facilitated by interdisciplinary research, integrating knowledge from multiple fields to simulate complex systems. Large Language Models (LLMs), such as ChatGPT, represent this approach's capabilities, utilizing reinforcement learning with human feedback (RLHF). Current research initiatives aim to integrate neural networks with symbolic computing, introducing a new generation of hybrid computational models.


DARWIN Series: Domain Specific Large Language Models for Natural Science

Xie, Tong, Wan, Yuwei, Huang, Wei, Yin, Zhenyu, Liu, Yixuan, Wang, Shaozhou, Linghu, Qingyuan, Kit, Chunyu, Grazian, Clara, Zhang, Wenjie, Razzak, Imran, Hoex, Bram

arXiv.org Artificial Intelligence

Emerging tools bring forth fresh approaches to work, and the field of natural science is no different. In natural science, traditional manual, serial, and labour-intensive work is being augmented by automated, parallel, and iterative processes driven by artificial intelligence-based experimental automation and more. To add new capabilities in natural science, enabling the acceleration and enrichment of automation of the discovery process, we present DARWIN, a series of tailored LLMs for natural science, mainly in physics, chemistry, and material science. This series relies on open-source LLMs, incorporating structured and unstructured scientific knowledge from public datasets and literature. We fine-tuned the models using over 60,000 instruction data points, emphasizing factual correctness. During the fine-tuning, we introduce the Scientific Instruction Generation (SIG) model, automating instruction generation from scientific texts. This eliminates the need for manual extraction or domain-specific knowledge graphs and efficiently injects scientific knowledge into the model. We also explore multi-task training strategies, revealing interconnections between scientific tasks. The DARWIN series not only achieves state-of-the-art results on various scientific tasks but also diminishes reliance on closed-source AI models. Our research showcases the ability of LLMs in the scientific domain, with the overarching goal of fostering prosperity within the broader AI for science community.


Learn Data Science and Machine Learning with Python - Views Coupon

#artificialintelligence

Data science is an interdisciplinary branch of study that employs statistics, scientific computing, scientific techniques, procedures, algorithms, and systems to extract or infer information and insights from noisy, structured, and unstructured data. Data science also integrates domain knowledge from the underlying application domain (e.g., natural sciences, information technology, medicine). Data science has several facets and may be regarded as a science, a research paradigm, a research technique, a field, a workflow, and a career. Data science is a "concept that unifies statistics, data analysis, informatics, and their associated approaches" to "understand and analyse actual events" using data. It employs techniques and theories borrowed from several domains within the framework of mathematics, statistics, computer science, information science, and domain knowledge. Data science, however, is distinct from computer science and information science.


Artificial Intelligence: The National Network of High Schools that want to include this specialty in their programs is born

#artificialintelligence

The idea of including artificial intelligence in school curricula began in the far north-east of Italy, specifically at the "Bunarrotti" secondary school in Monfalcone, where the principal, Vincenzo Kaiko, has also talked about creating a real network of schools that intend to offer their students courses on this subject. Vincenzo Kaiko explains that data science and artificial intelligence are scientific disciplines closely related to each other and to other fields of knowledge such as mathematics, natural sciences, humanities, and economics, which together represent the most interesting frontier of new information and communication technologies. The Monfalcone school principal adds: "Integrating the study of data science and artificial intelligence into the high school track can allow male and female students to gain important basic knowledge in rapidly expanding fields of science and technology, both in terms of broadening their cultural background and in terms of orientation towards university studies. The study of these two disciplines also supports the development of logical-mathematical skills, analytical and abstract skills, problem-solving ability and creativity, in an interdisciplinary and mutually enriching relationship both with mathematics, physics and the natural sciences, and with the humanistic disciplines." There are currently four Italian schools that have independently started secondary school curricula focused on data science and artificial intelligence: these are the Maserati high school in Foggera, Volta in Reggio Calabria and Galilei in Trento.


Changing the Nature of AI Research

Communications of the ACM

In many ways, we are living in quite a wondrous time for artificial intelligence (AI), with every week bringing some awe-inspiring feat in yet another tacit knowledge (https://bit.ly/3qYrAOY). Of particular recent interest are the large learned systems based on transformer architectures that are trained with billions of parameters over massive Web-scale multimodal corpora. Prominent examples include large language models (https://bit.ly/3iGdekA). The emergence of these large learned models is also changing the nature of AI research in fundamental ways. Just the other day, some researchers were playing with DALL-E and thought that it seems to have developed a secret language of its own (https://bit.ly/3ahH1Py).


How AI is helping the natural sciences

#artificialintelligence

The impact of climate change on Brazil's Atlantic coastline is a research focus at the University of São Paulo's machine-intelligence centre. Credit: Antonello Veneri/AFP via Getty

Artificial intelligence (AI) is increasingly becoming a tool for researchers in other science and technology fields, forging collaborations across disciplines. Stanford University in California, which produces an index that tracks AI-related data, finds in its 2021 report that the number of AI journal publications grew by 34.5% from 2019 to 2020, up from 19.6% between 2018 and 2019 (see go.nature.com/3mdt2yq). AI publications represented 3.8% of all peer-reviewed scientific publications worldwide in 2019, up from 1.3% in 2011. Five AI researchers describe the fruits of these collaborations, beyond journal publications, and talk about how they are helping to break down barriers between disciplines. At the University of São Paulo in Brazil, where I lead the Center for Artificial Intelligence (C4AI), our main goal is to produce machine-intelligence research that has a direct impact on society and industry.